Visual Semantics Allow for Textual Reasoning Better in Scene Text Recognition
Abstract
Existing Scene Text Recognition (STR) methods typically use a language model to optimize the joint probability of the 1D character sequence predicted by a visual recognition (VR) model. This ignores the 2D spatial context of visual semantics within and between character instances, so these methods do not generalize well to arbitrary-shape scene text. To address this issue, we make the first attempt to perform textual reasoning based on visual semantics in this paper. Technically, given the character segmentation maps predicted by a VR model, we construct a subgraph for each instance, where nodes represent the pixels in it and edges are added between nodes based on their spatial similarity. Then, these subgraphs are sequentially connected by their root nodes and merged into a complete graph. Based on this graph, we devise a graph convolutional network for textual reasoning (GTR) by supervising it with a cross-entropy loss. GTR can be easily plugged into representative STR models to improve their performance owing to better textual reasoning. Specifically, we construct our model, namely S-GTR, by paralleling GTR to the language model in a segmentation-based STR baseline, which can effectively exploit the visual-linguistic complementarity via mutual learning. S-GTR sets a new state of the art on six challenging STR benchmarks and generalizes well to multi-linguistic datasets. Code is available at https://github.com/adeline-cs/GTR.
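The graph construction described in the abstract can be illustrated with a small NumPy sketch. This is not the authors' implementation (see the linked repository for that). It assumes that "spatial similarity" means a Euclidean distance threshold (`radius`), that each subgraph's root is the pixel nearest the instance centroid, and that the graph convolution uses the common symmetric normalization. The function names `build_text_graph` and `gcn_layer` are hypothetical, and every mask is assumed to be non-empty.

```python
import numpy as np

def build_text_graph(masks, radius=1.5):
    """Merge per-character pixel subgraphs into one adjacency matrix.

    masks: list of non-empty 2D boolean arrays in reading order.
    radius: assumed distance threshold standing in for spatial similarity.
    """
    coords, owner = [], []
    for k, mask in enumerate(masks):
        pts = np.argwhere(mask)              # (row, col) of each pixel in instance k
        coords.append(pts)
        owner.append(np.full(len(pts), k))
    coords = np.concatenate(coords).astype(float)
    owner = np.concatenate(owner)

    # Intra-instance edges: nearby pixels that belong to the same instance.
    dist = np.linalg.norm(coords[:, None, :] - coords[None, :, :], axis=-1)
    same = owner[:, None] == owner[None, :]
    adj = (same & (dist <= radius)).astype(float)
    np.fill_diagonal(adj, 0.0)

    # Root of each subgraph: pixel nearest the instance centroid (assumption).
    roots = []
    for k in range(len(masks)):
        idx = np.flatnonzero(owner == k)
        centroid = coords[idx].mean(axis=0)
        nearest = np.argmin(np.linalg.norm(coords[idx] - centroid, axis=1))
        roots.append(int(idx[nearest]))

    # Inter-instance edges: chain consecutive roots in reading order.
    for a, b in zip(roots[:-1], roots[1:]):
        adj[a, b] = adj[b, a] = 1.0
    return adj, coords, roots

def gcn_layer(adj, feats, weight):
    """One graph convolution: ReLU(D^-1/2 (A + I) D^-1/2 X W)."""
    a_hat = adj + np.eye(len(adj))           # add self-loops
    d_inv_sqrt = 1.0 / np.sqrt(a_hat.sum(axis=1))
    norm = a_hat * d_inv_sqrt[:, None] * d_inv_sqrt[None, :]
    return np.maximum(norm @ feats @ weight, 0.0)

# Toy usage: two 2x2 character blobs on a 4x8 canvas.
m1 = np.zeros((4, 8), dtype=bool)
m1[1:3, 0:2] = True
m2 = np.zeros((4, 8), dtype=bool)
m2[1:3, 5:7] = True
adj, coords, roots = build_text_graph([m1, m2])
rng = np.random.default_rng(0)
out = gcn_layer(adj, coords, rng.normal(size=(2, 4)))  # pixel coordinates as toy features
```

In a full model, the node features would come from the VR model rather than raw coordinates, and the layer outputs would be pooled per character and trained with a cross-entropy loss, as the abstract states.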
Similar resources
Exploiting Color Information for Better Scene Text Recognition
The problem of scene text recognition has gained significant importance because of its numerous applications. A variety of methods has been recently proposed that explore various theoretical and practical aspects to solve this problem. In this work, we focus towards a framework to recognize the text present in outdoor scene images. The text information carries one important property, that is, i...
Exploiting Colour Information for Better Scene Text Recognition
This paper presents an approach to text recognition in natural scene images. The main contribution of this paper is the efficient exploitation of colour information for the identification of text regions in the presence of surrounding noise. We propose a pipeline of image processing operations involving the bilateral regression for the identification of characters in the images. A pre-processin...
Frame Semantics in Text-to-Scene Generation
3D graphics scenes are difficult to create, requiring users to learn and utilize a series of complex menus, dialog boxes, and often tedious direct manipulation techniques. By giving up some amount of control afforded by such interfaces we have found that users can use natural language to quickly and easily create a wide variety of 3D scenes. Natural language offers an interface that is intuitiv...
Towards Text Recognition in Natural Scene Images
In this paper, we propose a novel methodology for text detection in natural scene images. The proposed methodology is based on an efficient binarization and enhancement technique followed by a suitable connected component analysis procedure. Image binarization successfully processes natural scene images having shadows, non-uniform illumination, low contrast and large signal-dependent noise. Conn...
Attentive Visual Recognition for Scene Exploration
Vision is an active process where behaviorally important information is selectively gathered. We present a model of scene exploration in which the high-resolution fovea is deployed to interesting regions by selective attention processes. The system switches between exploratory and recognition modes such that in exploration mode, regions of interest are investigated and in recognition mode, a gi...
Journal
Journal title: Proceedings of the ... AAAI Conference on Artificial Intelligence
Year: 2022
ISSN: 2159-5399, 2374-3468
DOI: https://doi.org/10.1609/aaai.v36i1.19971